Abstract: We consider the level of information security provided
by random linear network coding in network scenarios in which all
nodes comply with the communication protocols yet are assumed to be
potential eavesdroppers (i.e. "nice but curious"). For this setup,
which differs from wiretapping scenarios considered previously, we
develop a natural algebraic security criterion and prove several of
its key properties. A preliminary analysis of the impact of network
topology on the overall network coding security, in particular for
complete directed acyclic graphs, is also included.
I. Introduction

Under the classical networking paradigm, in which intermediate
nodes are only allowed to store and forward packets, information
security is usually viewed as an independent feature with little or
no relation to other communication tasks. In fact, since intermediate
nodes receive exact copies of the sent packets, data confidentiality
is commonly ensured by cryptographic means at higher layers of the
protocol stack. Breaking with the ruling paradigm, network coding
allows intermediate nodes to mix information from different data
flows [1], [2] and thus provides an intrinsic level of data security,
arguably one of the least well understood benefits of network coding.

Previous work on this issue has been mostly concerned with
constructing codes capable of splitting the data among different
links, such that reconstruction by a wiretapper is either very
difficult or impossible. In [3], the authors present a secure linear
network code that achieves perfect secrecy against an attacker with
access to a limited number of links. A similar problem is considered
in [4], featuring a random coding approach in which only the input
vector is modified. [5] introduces a different information-theoretic
security model, in which a system is deemed to be secure if an
eavesdropper is unable to get any decoded or decodable (also called
meaningful) source data. Still focusing on wiretapping attacks, [6]
provides a simple security protocol exploiting the network topology:
an attacker is shown to be unable to get any meaningful information
unless it can access those links that are necessary for the
communication between the legitimate sender and the receiver, who are
assumed to be using network coding. As a distributed
capacity-achieving approach for the multicast case, randomized
network coding [7], [8] has been shown to extend naturally to packet
networks with losses [9] and Byzantine modifications (both detection
and correction [10], [11], [12], [13]). [14] adds a cost criterion to
the secure network coding problem, providing heuristic solutions for
a coding scheme that minimizes both the network cost and the
probability that the wiretapper is able to retrieve all the messages
of interest.

Fig. 1. Canonical Network Coding Example. In this image, intermediate
nodes are represented with squares. With this code, node 4 is a
vulnerability for the network since it can decode all the information
sent through it. Note that the complete opposite happens for node 5,
which receives no meaningful information whatsoever.

In this work, we approach network coding security from a different
angle: our focus is not on the threat posed by external wiretappers
but on the more general threat posed by intermediate nodes. We assume
that the network consists entirely of "nice but curious" nodes, i.e.
they comply with the communication protocols (in that sense, they are
well-behaved) but may try to acquire as much information as possible
from the data that passes through them (in which case, they are
potentially malicious). This notion is highlighted in the following
example.

Example 1: Consider the canonical network coding example with 7
nodes, shown in Figure 1. Node 1 sends a flow to sinks 6 and 7
through intermediate nodes 2, 3, 4 and 5. From the point of view of
security, we can distinguish between three types of intermediate
nodes in this setting: (1) those that only get a non-meaningful part
of the information, such as node 5; (2) those that obtain all of the
information, such as node 4; and (3) those that get partial yet
meaningful information, such as nodes 2 and 3. Although this network
code could be considered secure against single-edge external
wiretapping (i.e., the wiretapper is not able to retrieve the whole
data simply by eavesdropping on a single edge), it is clearly
insecure against internal eavesdropping by an intermediate node.
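
The three node types of Example 1 can be checked mechanically. The sketch below assumes the usual XOR code for the butterfly network (node 2 relays symbol a, node 3 relays symbol b, node 4 hears both, node 5 hears a XOR b); these coding vectors are an illustrative assumption, not taken from Figure 1 itself. A source symbol counts as recoverable when its unit vector lies in the span of the coding vectors a node receives.

```python
from itertools import product

# Coding vectors over GF(2) for the two source symbols (a, b).
# Assumed code for the butterfly network: node 2 relays a, node 3 relays b,
# node 4 hears both, node 5 hears a XOR b.
RECEIVED = {
    2: [(1, 0)],
    3: [(0, 1)],
    4: [(1, 0), (0, 1)],
    5: [(1, 1)],
}

def span_gf2(rows):
    """Return every vector reachable by XOR-combining the given rows."""
    k = len(rows[0])
    span = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        vec = tuple(
            sum(c * r[i] for c, r in zip(coeffs, rows)) % 2 for i in range(k)
        )
        span.add(vec)
    return span

def decodable_symbols(rows):
    """Indices of source symbols whose unit vector lies in the row span."""
    k = len(rows[0])
    span = span_gf2(rows)
    return [i for i in range(k) if tuple(int(j == i) for j in range(k)) in span]

def classify(rows):
    """'all', 'partial' or 'none', matching the three node types of Example 1."""
    found = decodable_symbols(rows)
    if len(found) == len(rows[0]):
        return "all"
    return "partial" if found else "none"

for node, rows in RECEIVED.items():
    print(node, classify(rows))
```

Under these assumed vectors, nodes 2 and 3 come out as partial, node 4 as all, and node 5 as none, which is the classification given in the example.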

Motivated by this example, we set out to investigate the 
security potential of network coding. Our main contributions 
are as follows: 

• Problem Formulation: We formulate a secure network 
coding problem, in which all intermediate nodes are 
viewed as potential eavesdroppers and the goal is to 
characterize the intrinsic level of security provided by 
random linear network coding. 

• Algebraic Security Criterion: Based on the notion that the
number of decodable bits available to each intermediate
node is limited by the degrees of freedom it receives, we
are able to provide a natural secrecy constraint for network
coding and to prove some of its most fundamental
properties.

• Security Analysis for Complete Directed Acyclic Graphs:
As a preliminary step towards understanding the interplay
between network topology and security against eavesdropping
nodes, we present a rigorous characterization
of the achievable level of algebraic security for this class
of complete graphs.

The remainder of this paper is organized as follows. First, a
formal problem statement is given in Section II, followed by a
detailed analysis of the algebraic security of Randomized Linear
Network Coding in Section III. In Section IV this analysis is
carried out specifically for complete directed acyclic graphs.
The paper concludes with Section V.

II. Problem Setup 

We adopt the network model of [2]: we represent the network as an
acyclic directed graph $G = (V, E)$, where $V$ is the set of nodes
and $E$ is the set of edges. Edges are denoted by round brackets
$e = (v, v') \in E$, in which $v = \mathrm{head}(e)$ and
$v' = \mathrm{tail}(e)$. The set of edges that end at a vertex
$v \in V$ is denoted by
$\Gamma_I(v) = \{e \in E : \mathrm{head}(e) = v\}$, and the in-degree
of the vertex is $\delta_I(v) = |\Gamma_I(v)|$; similarly, the set of
edges originating at a vertex $v \in V$ is denoted by
$\Gamma_O(v) = \{e \in E : \mathrm{tail}(e) = v\}$, the out-degree
being represented by $\delta_O(v) = |\Gamma_O(v)|$.

Discrete random processes $X_1, \ldots, X_K$ are observable at one
or more source nodes. To simplify the analysis, we shall
consider that each network link is free of delays and that
there are no losses. Moreover, the capacity of each link is
one bit per unit time, and the random processes $X_i$ have a
constant entropy rate of one bit per unit time. Edges with
larger capacities are modelled as parallel edges and sources
of larger entropy rate are modelled as multiple sources at the
same node. We shall consider multicast connections as it is
the most general type of single connection; there are $d > 1$
receiver nodes. The objective is to transmit all the source
processes to each of the receiver nodes.

In linear network coding, edge $e = (v, u)$ carries the process
$y(e)$, which is defined below:
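
For reference, the standard algebraic formulation of linear network coding writes the process on an edge as a linear combination of the source processes generated at its origin node and of the processes on the edges feeding that node. The following is a sketch in that standard form; the coefficient symbols $\alpha_{l,e}$ and $\beta_{e',e}$ and the notation $X(v,l)$ are assumptions borrowed from that formulation rather than taken from this text.

```latex
y(e) = \sum_{l} \alpha_{l,e}\, X(v,l)
     + \sum_{e' \,:\, \mathrm{head}(e') = \mathrm{tail}(e)} \beta_{e',e}\, y(e')
```

Here the first sum runs over the source processes available at $v$ and the second over the edges ending at $v$, using the head and tail convention defined above.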



The transfer matrix $M$ describes the relationship between
an input vector $x$ and an output vector $z$, $z = xM$;
$M = A(I - F)^{-1}B^T$, where $A$ and $B$ represent, respectively,
the linear mixings of the input vector and of the output vector;
$A$ has size $K \times |E|$ and $B$ has $|E|$ columns. $F$ is the
adjacency matrix of the directed labelled line graph corresponding
to the graph $G$. In this paper we shall not consider matrix $B$,
which only refers to the decoding at the receivers. Thus, we shall
mainly analyse parts of the matrix $AG$, such that
$G = (I - F)^{-1}$; $a_i$ and $c_i$ denote column $i$ of $A$ and
$AG$, respectively. We define the partial transfer matrix
$M_{\Gamma_I(v)}$ (also called auxiliary encoding vector [9]) as the
observable matrix at a given node $v$, i.e. the observed matrix
formed by the symbols received at a node $v$. This is equivalent to
the fraction of the data that an intermediate node has access to in
a multicast transmission.
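
As a small illustration of how the partial transfer matrix is read off from $A(I - F)^{-1}$, the sketch below builds the matrices for a hypothetical three-edge network and extracts the columns seen by each node. The network, the coefficient values and the use of a small prime field (instead of a field of characteristic two) are assumptions made only to keep the arithmetic short. It relies on the fact that $F$ is nilpotent for an acyclic graph, so $(I - F)^{-1} = I + F + F^2 + \ldots$ is a finite sum.

```python
P = 7  # small prime field, chosen for illustration only

def mat_mul(X, Y):
    """Matrix product modulo P."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % P
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_add(X, Y):
    """Entry-wise sum modulo P."""
    return [[(a + b) % P for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def inverse_of_i_minus_f(F):
    """(I - F)^{-1} = I + F + F^2 + ...; finite because F is nilpotent
    when the underlying graph is acyclic."""
    n = len(F)
    total, power = identity(n), identity(n)
    for _ in range(n):
        power = mat_mul(power, F)
        total = mat_add(total, power)
    return total

def partial_transfer(AG, incoming_edges):
    """Columns of AG on a node's incoming edges (its partial transfer matrix)."""
    return [[row[e] for e in incoming_edges] for row in AG]

# Hypothetical network: edges e0 and e1 go from source s to node u,
# edge e2 goes from u to node t. K = 2 source symbols.
A = [[3, 0, 0],   # symbol 1 is injected on e0 with coefficient 3
     [0, 5, 0]]   # symbol 2 is injected on e1 with coefficient 5
F = [[0, 0, 2],   # e0 feeds e2 with coefficient 2
     [0, 0, 4],   # e1 feeds e2 with coefficient 4
     [0, 0, 0]]

AG = mat_mul(A, inverse_of_i_minus_f(F))
print(partial_transfer(AG, [0, 1]))  # node u sees two independent columns
print(partial_transfer(AG, [2]))     # node t sees one mixed column
```

In this toy case node u observes a rank-2 matrix and can decode both symbols, while node t observes a single column that mixes both symbols.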

Regarding the coding scheme, we consider the random
linear network coding scheme introduced in [7]; thus,
each coefficient of the matrices described above is chosen
independently and uniformly over all elements of a finite field
$\mathbb{F}_q$, where $q$ is a power of two.

Our goal is to evaluate the intrinsic security of random
linear network coding in multicast scenarios where all the
intermediate nodes in the network are potentially malicious
eavesdroppers. Specifically, our threat model assumes that
intermediate nodes perform the coding operations as outlined
above, and will try to decode as much data as possible.



III. Algebraic Security of Random Linear 
Network Coding 

A. Algebraic security 

The Shannon criterion for information-theoretic security [15]
corresponds in general terms to a zero mutual information between
the cypher-text ($C$) and the original message ($M$), i.e.
$I(M; C) = 0$. This condition implies that an attacker must guess
$< H(M)$ symbols to be able to compromise the data. With network
coding, on the other hand, if the attacker is capable of guessing
$M$ symbols, $K - M$ additional observed symbols are required for
decoding: since each received symbol is a linear combination of
the $K$ message symbols from the source, a receiver must receive
$K$ coded symbols in order to recover one message symbol. Thus, as
will be shown later, restricted rank sets of individual symbols do
not translate into immediately decodable data with high
probability. This notion is illustrated in Figure 2. In the scheme
shown on top, each intermediate node can recover half of the
transmitted symbols, whereas in the bottom scheme none of the nodes
can recover any portion of the sent data.

Fig. 2. Example of algebraic security. In the upper scheme data is not
protected, whereas in the lower scheme nodes 2 and 3 are unable to
recover any data symbols.

Definition 1 (Algebraic Security Criterion): The level of
security provided by random linear network coding is measured
by the number of symbols that an intermediate node $v$
has to guess in order to decode one of the transmitted symbols.
From a formal point of view,



where $l_d$ represents the number of partially diagonalizable
lines of the matrix (i.e. the number of message symbols that
can be recovered by Gaussian elimination).

Notice that the previous definition is equivalent to computing
the difference between the global rank of the code and
the local rank in each intermediate node $v$. Moreover, as more
and more symbols become compromised, the level of security
tends to 0; as we shall show in this section, with high
probability the number of individually decodable symbols $l_d$
goes to zero as the size of the field goes to infinity.
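
The two quantities used above, the local rank and the number $l_d$ of individually decodable symbols, can be computed by Gaussian elimination. The following is a minimal sketch under stated assumptions: it works over a prime field GF(p) rather than a field of characteristic two, the function names are hypothetical, and the Monte Carlo helper draws matrix entries uniformly at random, which simplifies the actual distribution of the partial transfer matrix.

```python
import random

def rref(rows, p):
    """Reduced row echelon form over GF(p), p prime; returns the nonzero rows."""
    m = [[x % p for x in r] for r in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        inv = pow(m[pivot_row][col], -1, p)  # modular inverse (Python 3.8+)
        m[pivot_row] = [(x * inv) % p for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m[:pivot_row]

def rank_and_ld(rows, p):
    """Rank of the received coding vectors and the number l_d of reduced rows
    with a single nonzero entry (symbols recoverable by elimination)."""
    reduced = rref(rows, p)
    l_d = sum(1 for r in reduced if sum(1 for x in r if x) == 1)
    return len(reduced), l_d

def estimate_p_ld_positive(k, received, p, trials=2000, seed=0):
    """Fraction of uniformly random `received` x k matrices over GF(p)
    for which at least one symbol is individually decodable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rows = [[rng.randrange(p) for _ in range(k)] for _ in range(received)]
        if rank_and_ld(rows, p)[1] > 0:
            hits += 1
    return hits / trials

# Assumed coding vectors in the spirit of Fig. 2, with K = 2 symbols:
print(rank_and_ld([[1, 0]], 2))  # node receives one plain symbol
print(rank_and_ld([[1, 1]], 2))  # node receives a mix of both symbols
print(estimate_p_ld_positive(4, 2, 2), estimate_p_ld_positive(4, 2, 251))
```

A plain symbol gives rank 1 with one decodable symbol, whereas a mixed symbol gives rank 1 with none. In the random experiment the estimated probability of a decodable symbol drops sharply when the field size grows, in line with the behaviour claimed for $l_d$.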

B. Security Characterization 

We are now ready to solve the problem of characterizing
the algebraic security of random linear network coding. The
key to our proofs is to analyze the properties of the partial
transfer matrix at each intermediate node. Recall that there are
two cases in which the intermediate node can gain access to
relevant information: (1) when the partial transfer matrix has
full rank and (2) when the partial transfer matrix has
diagonalizable parts. Thus, we shall carry out independent analyses
in terms of rank and in terms of partially diagonalizable matrices.

The following lemmas will be useful. 

Lemma 1: In the random linear network coding scheme,

$$P(\Delta_S > 0) \leq P(\exists v : \delta_I(v) > K).$$

Proof: See the Appendix. ■
It follows from this lemma that it is only necessary to consider
the case in which $K < \delta_I(v)$.

Lemma 2: The probability that a linear combination of
independent and uniformly distributed values in $\mathbb{F}_q$
yields the zero result is bounded by

where $h(q)$ is a function such that $O(h(q)) < O(q^2)$. Moreover,
$P(X_{lin} = 0)$ tends to 0 when $q \to \infty$.

Proof: See the Appendix. ■ 
Lemma 3: The probability of obtaining $Y$ zeros in one line
of the $K \times K$ transfer matrix $M$ is bounded by

Proof: See the Appendix. ■
Theorem 1: Let $P(l_d > 0)$ be the probability of recovering
a strictly positive number of symbols $l_d$ at the intermediate
nodes with $\delta_I(v) < K - 1$ by Gaussian elimination. Then,
$P(l_d > 0) \to 0$ with $q \to \infty$ and $K \to \infty$.

Proof: Let $M'$ be the transpose of the partial transfer
matrix at some vertex $v$, $M' = M^T_{\Gamma_I(v)}$. We consider the
process of Gaussian elimination of $M'$. It is unnecessary
to consider rank $K$, since in that case the matrix, w.h.p.,
is invertible and hence diagonalizable [8]. Thus, $M'$ is a
$\delta_I(v) \times K$ matrix, $\delta_I(v) < K$.

We prove the theorem constructively by analysing the
probability of having $K - 1$ zeros in one or more lines of
$M'$. Let $p$ be the probability of having $K - 1$ zeros in a
line of $M'$, and let $X$ be a random variable representing the
recoverable number of symbols when an intermediate node
has $\delta_I(v)$ degrees of freedom. It follows from Lemma 3 that

In the base case with $\delta_I(v) = 1$, at most $X = 1$ symbols can
be recovered, since there are not enough degrees of freedom
to perform Gaussian elimination and the only chance for
recovering a symbol is that the line of the matrix $M'$ already
has $K - 1$ zeros. The probability for this is $p$.

In the case that $1 < \delta_I(v) < K$, we can obtain directly a
number $L = l$ of lines with $K - 1$ zeros, and a number
$\delta_I(v) - l$ of lines in the opposite situation. Since we have
$\delta_I(v)$ degrees of freedom to perform Gaussian elimination, we
can obtain at most $\delta_I(v)$ symbols by successive elimination.
At each step the probability of obtaining a line with $K - 1$ zeros
is bounded by $p$.

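By analysing the different possibilities of combinations for
the lines that already have $K - 1$ zeros and the ones that can
be obtained by Gaussian elimination, we get

$$P(X = x) \leq \sum_{l} \binom{\delta_I(v)}{l} p^{l} (1-p)^{\delta_I(v)-l} P_l(X = x),$$

$$P_l(X = x) \leq \binom{\delta_I(v)-l}{x} p^{x} (1-p)^{\delta_I(v)-l-x},$$

where $P_l(X = x)$ represents $P(X = x \mid L = l)$.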

Approximating the binomial distribution by a normal distribution
yields

$$P_l(X = x) \approx \frac{1}{\sqrt{2\pi(\delta_I(v)-l)\,p(1-p)}} \exp\left(-\frac{(x-(\delta_I(v)-l)p)^2}{2(\delta_I(v)-l)\,p(1-p)}\right).$$

Since $p \leq p^* < 1$, we can state that, when $q \to \infty$ and
$p \to 0$, $P_l(X = x)$ is of the order of $\exp(-x^2)$. When $K$
goes to $\infty$, so does $x$, and hence $P_l(X = K - 1)$ tends to
zero. Since

$$P(X = K-1) = \sum_{l} \binom{\delta_I(v)}{l} p^{l} (1-p)^{\delta_I(v)-l} P_l(X = K-1),$$

and $P_l(X = K - 1)$ decreases exponentially, and $l$ only
increases linearly, $P(X = K - 1)$ also tends to zero.

The probability of obtaining $X < K - 1$ symbols is bounded
by $P(X = K - 1)$; it follows that the probability of decoding
$X$ symbols with any $\delta_I(v) < K$ goes to zero as $q$ and $K$
tend to infinity. ■

IV. Algebraic Security of the Complete Graph 

Notice that, as a consequence of the property outlined in
Lemma 1, the algebraic security of a graph is topology
dependent. A node with $\delta_I(v) > K$ will not necessarily
receive a full-rank partial transfer matrix. The rank depends
on the available paths between sources and each intermediate
node. More specifically, depending on the topology of the
graph, some nodes may receive only combinations of symbols
derived from matrices with restricted rank, i.e. less than $K$.
This includes, for example, trees, where a node connected
directly to the source by a link of capacity $C$ can only have
children that receive at most rank $C$.

As a first step towards general network models, we consider
the case of complete acyclic directed graphs $G = (V, E)$,
$n = |V|$, which can be generated as follows.

• Generate random labels for the n vertices. These have
some ordering $\{e_1, e_2, \ldots, e_n\}$ associated to them;

• Make an outgoing (directed) edge from the vertex with 
the minimum label to every vertex with a higher label; 

• Continue until we reach a vertex where there are no more 
possibilities for connections. 

This algorithm generates a complete acyclic directed graph
with one source, one sink and $|E| = n(n - 1)/2$ edges,
since the total degree of each vertex is
$n - 1 = \delta_I(v) + \delta_O(v)$. The source and the sink are
naturally determined as those nodes that have only outgoing edges
or only incoming edges, respectively. The ordering ensures that
this algorithm always generates an acyclic directed graph,
conferring the graphs generated in this way specific properties
such as the distribution of the in and out-degrees. These
properties can be determined directly from the order of the vertex
using $\delta_O(v) = n - \mathrm{order}(v)$ and
$\delta_I(v) = n - \delta_O(v) - 1 = \mathrm{order}(v) - 1$.
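
The construction and the degree formulas above can be checked with a few lines of code. This is a minimal sketch that assumes the vertices are already sorted, so that the label of a vertex is its order, from 1 for the source to n for the sink; the function names are hypothetical.

```python
from itertools import combinations

def complete_dag(n):
    """Directed edges (u, v) with u < v over vertices ordered 1..n."""
    return [(u, v) for u, v in combinations(range(1, n + 1), 2)]

def degrees(n, edges):
    """In-degree and out-degree of every vertex."""
    indeg = {v: 0 for v in range(1, n + 1)}
    outdeg = {v: 0 for v in range(1, n + 1)}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

n = 6
edges = complete_dag(n)
indeg, outdeg = degrees(n, edges)

assert len(edges) == n * (n - 1) // 2   # |E| = n(n - 1)/2
for v in range(1, n + 1):
    assert outdeg[v] == n - v           # delta_O(v) = n - order(v)
    assert indeg[v] == v - 1            # delta_I(v) = order(v) - 1
print(len(edges), indeg, outdeg)
```

Vertex 1 has only outgoing edges and vertex n has only incoming edges, so they play the roles of source and sink.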

Before proving our next theorem, we introduce the follow- 
ing lemmas. 

Lemma 4: In complete acyclic directed graphs, a node that
receives $R$ symbols, receives w.h.p. a partial transfer matrix
with rank equal to $\min(R, K)$.

Proof: See the Appendix. ■ 

Lemma 5: For the complete directed acyclic graph, w.h.p.,

$$\Delta_S(v) = \frac{K - \min(K, \mathrm{order}(v))}{K}.$$

Proof: See the Appendix. ■ 



Theorem 2: Let $\phi_S$ be the secure max-flow, defined as
the maximum number of symbols that may be secured in
a transmission by using random linear network coding. For
a complete acyclic directed graph with $n$ nodes, the secure
max-flow equals the max-flow min-cut capacity of the network
and is $n - 1$. Conversely, the minimum number of required
symbols for secured transmission is $n - 1$ symbols.
Proof:

Suppose, by contradiction, that $K = n - 1$ is the max-flow
min-cut capacity of the complete directed acyclic graph.
The maximum order of an intermediate node $v$ is $n - 2$, thus
by Lemma 5 we have $\Delta_S(v) = 1/(n - 1)$. It follows that the
secure max-flow of the complete acyclic directed graph equals
the capacity of the graph.

By contradiction, let the minimum number of required
symbols for secured transmission be $m_S < n - 2$. There
exists an intermediate node $v$ such that
$\mathrm{order}(v) = n - 1$, and consequently, $\Delta_S(v) = 0$.
Then the minimum number of required symbols for secure
transmission is $m_S = n - 1$.

■ 

It follows that the way to secure this class of complete 
graphs is to transmit at the max-flow min-cut capacity, if 
necessary by adding "dummy" symbols. 
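
The effect of transmitting at the max-flow min-cut capacity can be illustrated numerically. The sketch below evaluates the in-degree form $\Delta_S(v) = (K - \min(K, \delta_I(v)))/K$ used in the proof of Lemma 5, with $\delta_I(v) = \mathrm{order}(v) - 1$. It assumes that orders run from 1 (source) to n (sink), so that the intermediate nodes are those of order 2 to n - 1; this indexing and the function names are assumptions of the example.

```python
from fractions import Fraction

def security_levels(n, k):
    """Delta_S(v) = (K - min(K, delta_I(v))) / K for the intermediate nodes
    of the complete DAG on n ordered vertices, with delta_I(v) = order(v) - 1."""
    levels = {}
    for order in range(2, n):           # orders 2..n-1: intermediate nodes
        in_degree = order - 1
        levels[order] = Fraction(k - min(k, in_degree), k)
    return levels

def all_secured(n, k):
    """True when every intermediate node has a strictly positive level."""
    return all(level > 0 for level in security_levels(n, k).values())

n = 6
print(security_levels(n, n - 1))   # K = n - 1 symbols
print(all_secured(n, n - 1), all_secured(n, n - 2))
```

With K = n - 1 every intermediate node keeps a strictly positive level, the smallest being 1/(n - 1). With K = n - 2 the intermediate node of highest order already receives K symbols and its level drops to zero, which is why dummy symbols may be needed to reach n - 1.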

V. Conclusions 

Intrigued by the security potential inherent to random linear 
network coding, we developed a specific algebraic security 
criterion, for which we proved a set of key properties. Perhaps 
one of the most striking conclusions of our analysis is that 
algebraic security with network coding is very dependent on 
the topology of the network. As an example, we focused on 
complete acyclic directed graphs, and determined the secure 
max-flow, as well as the minimum number of symbols required 
for algebraic security. As part of our ongoing work, we are 
extending this analysis to other more general network models. 
Ultimately, we would like to develop secure communication
protocols capable of exploiting random linear network coding
as an almost free cypher.

Proof of Lemma 2

Contrary to the sum, the product of independent and
uniformly distributed values in $\mathbb{F}_q$ is not independent
and uniformly distributed. In fact, there are two ways to obtain
a zero in a multiplication in $\mathbb{F}_q$: (1) by multiplication
between an element $a \in \mathbb{F}_q$ and 0, and (2) by
multiplication over two elements $a \in \mathbb{F}_q$ and
$b \in \mathbb{F}_q$, such that $a \neq 0$ and $b \neq 0$, but
$ab = 0$. Now, the total number of entries of the multiplicative
table between $q$ elements of $\mathbb{F}_q$ is $q^2$, and there are
at most $2q$ instances of the first case: $q$ instances of $ab = 0$
with $a = 0$ and $b \neq 0$, and $q$ instances of $ab = 0$ with
$a \neq 0$ and $b = 0$. As for the second case, it is possible to
prove by contradiction that the number of zeros obtained this way is
strictly less than $q^2$: if this was not the case, all products of
elements of $\mathbb{F}_q$ would be zero, and that is absurd. Since
this is true for any $q$, the number of zeros grows
$O(h(q)) < O(q^2)$. Thus, we have

q 

Since for large enough $q$ we have $(2 + h(q))/q < 1$, it follows
that $P(X_{lin} = 0)$ tends to 0 when $q \to \infty$.

Proof of Lemma 1

We will prove this constructively in terms of the ranks of
parts of the transfer matrix. The auxiliary encoding vector in
each intermediate node $v$ is given by

$$M_{\Gamma_I(v)} = \left(A(I - F)^{-1}\right)_{\Gamma_I(v)},$$

where $M_{\Gamma_I(v)}$ denotes the columns of the matrix
corresponding to the incoming edges of $v$. The dimension of
$M_{\Gamma_I(v)}$ is $K \times \delta_I(v)$, with
$\delta_I(v) < |E|$.

To determine the rank of the partial transfer matrix, we note
that the transfer matrix $M = A(I - F)^{-1}B^T$ for the network
must be invertible, and hence, $\mathrm{rank}(M) = K$. On the other
hand, to determine the rank of $A(I - F)^{-1}$ we use the fact
that $(I - F)$ is invertible and thus
$\mathrm{rank}((I - F)^{-1}) = |E|$. We also have

$$\mathrm{rank}(A(I - F)^{-1}) < |E|,$$

because the dimension of $A(I - F)^{-1}$ is $K \times |E|$. But,
since

$$\mathrm{rank}(A(I - F)^{-1}B^T) = K = \min(\mathrm{rank}(A(I - F)^{-1}), \mathrm{rank}(B))$$

holds and $K < |E|$ (true because $K$ must be less than the
minimum cut in the network) we conclude that

$$\mathrm{rank}(A(I - F)^{-1}) = K.$$

We now consider $\Delta_S(v)$ at some vertex $v$. For that, we can
consider two distinct cases: the first one is if
$K < \delta_I(v)$. In this case, we cannot assume anything about
$\Delta_S(v)$, since the rank of the matrix $M_{\Gamma_I(v)}$ will
be dependent on the topology of the network. As for the second case,
$\mathrm{rank}(M_{\Gamma_I(v)}) < K \Rightarrow \Delta_S(v) < 0$. ■



Proof of Lemma 3

Each position of a line of the transfer matrix $M$ is a linear
combination of independently and uniformly chosen values in
$\mathbb{F}_q$, and thus, the probability of obtaining a zero in a
position is given by Lemma 2. The result follows by considering all
the combinations of the possible positions in which the $Y$ zeros
may occur. ■

Proof of Lemma 4

Suppose that a given intermediate node receives $R = K + \theta$
symbols, $\theta > 0$. It is clear that the maximum possible rank is
$K$ and thus there is a way to remove $\theta$ columns s.t. the rank
of the resulting set will still be at maximum $K$. Now consider
the case in which vertex $v$ receives at most $K$ symbols. If the
columns are linearly dependent, the condition

$$x_{h_1} c_{h_1} + x_{h_2} c_{h_2} + \cdots + x_{h_n} c_{h_n} = (0 \ldots 0)^T,$$

such that $x_{h_1}, x_{h_2}, \ldots, x_{h_n} \in \mathbb{F}_q$ are
not all 0 and $h_1, h_2, \ldots, h_n$ represent the columns in
$\Gamma_I(v)$, will be satisfied. Since the linear combination of
lines of the transfer matrix is again a linear combination of
independent and uniformly distributed values in $\mathbb{F}_q$, it
follows from Lemma 3 that the probability of obtaining
$(0 \ldots 0)^T$ tends to 0 when $q \to \infty$ and
$K \to \infty$, and thus, the columns
$h_1, h_2, \ldots, h_n \in \Gamma_I(v)$ are linearly
independent w.h.p. ■

Proof of Lemma 5

It follows from Lemma 4 that w.h.p., the number of symbols
received by a vertex is the rank of the partial transfer matrix
received (and at most $K$) and thus

$$\Delta_S(v) = \frac{K - \min(K, \delta_I(v))}{K} = \frac{K - \min(K, \mathrm{order}(v) - 1)}{K}.$$



